A Recursive Annotation Scheme for Referential Information Status
Abstract
We provide a robust and detailed annotation scheme for information status that is easy to use, follows a semantic rather than a cognitive motivation, and achieves reasonable inter-annotator agreement. The scheme rests on two main assumptions: first, that information status depends strongly on (in)definiteness, and second, that it ought to be understood as a property of referents rather than of words. Accordingly, the scheme relies on overt (in)definiteness marking and provides different categories for each class. Definites are grouped according to the information source by which the referent is identified. A special aspect of the scheme is that non-anaphoric expressions (e.g. names) are classified according to whether their referents are likely to be known or unknown to an expected audience. The scheme also provides a solution for annotating complex nominal expressions, which may recursively contain embedded expressions. In annotating a corpus of German radio news bulletins, the full scheme achieved a kappa score of .66; a core scheme of six top-level categories yields κ = .78.
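The kappa scores reported above measure agreement corrected for chance. The abstract does not say which kappa variant was used, so the following is only a minimal sketch of Cohen's kappa for two annotators, one common choice. The function name `cohen_kappa` and the labels `"given"` and `"new"` are hypothetical placeholders, not categories taken from the scheme.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Assumption: two annotators, one label per item. The source does not
    state which kappa variant was used for the reported scores.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")

    n = len(labels_a)

    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each category, the product of the two
    # annotators' marginal proportions, summed over categories.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical category throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels, for illustration only.
annotator_1 = ["given", "given", "new", "new"]
annotator_2 = ["given", "new", "new", "new"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

In this toy example the annotators agree on three of four items (observed agreement 0.75) and chance agreement is 0.5, so kappa is 0.5. A scheme with fewer, coarser categories will typically show higher agreement, which is consistent with the core scheme scoring higher than the full scheme.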
Similar resources
Arndt Riester and Stefan Baumann: The RefLex Scheme – Annotation Guidelines
The purpose of the RefLex annotation scheme (Baumann and Riester 2012) is the two-dimensional analysis of textual or spoken corpus data with regard to referential information status (including coreference and bridging) as well as lexical information status (semantic relations). We provide some linguistic-philosophical background followed by detailed guidelines, which can be used in combination w...
Coreference, Lexical Givenness and Prosody in German
In this article we discuss some empirical results concerning the impact of different levels of information status (i.e. referents and words, respectively) on the prosodic realization of referential expressions in annotated corpora of read and spontaneous speech. Both at the referential and at the lexical level not only given and new but also intermediate classes of givenness/novelty have to be ...
Referential and Lexical Givenness:
The main objective of the paper is to show that for an adequate analysis of an item’s information status in spoken language two levels of givenness have to be investigated: a referential and a lexical level. This separation is a crucial step towards our goal to arrive at the best possible classification of nominal expressions occurring in natural discourse which reflects our understanding of al...
A Unified Representation For Morphological, Syntactic, Semantic, And Referential Annotations
This paper reports on the SYN-RA (SYNtax-based Reference Annotation) project, an on-going project of annotating German newspaper texts with referential relations. The project has developed an inventory of anaphoric and coreference relations for German in the context of a unified, XML-based annotation scheme for combining morphological, syntactic, semantic, and anaphoric information. The paper d...
An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies
A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...
Publication date: 2010